Building and Interrogating Knowledge Graphs in Natural Language

Abstract: Building knowledge graphs from text and querying them in natural language has always been an attractive idea, since it avoids the need for an expert to mediate between naive users and standard database and knowledge-graph representation languages such as SQL and SPARQL. The problem has always been that there is no usable semantics for natural language, nor any inference system based on it that will in practice get us from the text, or the corresponding knowledge graph, to an answer to the question. There are two contemporary approaches to this problem. The machine-reading approach parses large amounts of unlabeled, multiply-sourced text to extract natural language predicates grounded in n-tuples of linked named entities. The result is a large Knowledge Graph whose nodes are the entities and whose arcs are the predications. Directed meaning postulates over those predicates are then extracted from the Knowledge Graph to form a second Entailment Graph. Each postulate is based on distributional inclusion between the sets of entity tuples for a pair of predicates. The alternative approach treats vector embeddings, pre-trained on even larger amounts of text, as embodying a latent form of entailment graph. That graph is revealed by supervised fine-tuning of a large contextualized language model, such as BERT and its descendants, on corpora of entailment examples. The talk will report progress and results from building a large entailment graph and using it to interrogate knowledge graphs built with the same predicate relations. It will also compare this approach with recent results from the alternative, language-model-based approach.
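A minimal sketch of the distributional-inclusion idea described above follows. It assumes a toy knowledge graph stored as a mapping from each predicate to the set of entity tuples it was observed with. It also assumes a simple directed score, the fraction of p's tuples also attested with q, and an arbitrary threshold. The predicates, placeholder entities, function names and threshold are all hypothetical. Published entailment-graph systems use more refined, frequency-weighted measures.

```python
from itertools import permutations

# Hypothetical toy knowledge graph: predicate -> set of entity tuples
# (placeholder names) that the predicate was observed with in parsed text.
KG = {
    "buy": {("Company1", "Startup1"), ("Company2", "Startup2"), ("Company3", "Startup3")},
    "own": {("Company1", "Startup1"), ("Company2", "Startup2"),
            ("Company3", "Startup3"), ("Company4", "Startup4")},
    "sell": {("Company5", "Startup4")},
}

def inclusion(p, q, kg):
    """Directed score: fraction of p's entity tuples that are also attested with q."""
    tuples_p = kg[p]
    if not tuples_p:
        return 0.0
    return len(tuples_p & kg[q]) / len(tuples_p)

def entailment_graph(kg, threshold=0.9):
    """Keep directed edges p -> q whose inclusion score meets the (assumed) threshold."""
    edges = {}
    for p, q in permutations(kg, 2):  # ordered pairs, so p -> q and q -> p are scored separately
        score = inclusion(p, q, kg)
        if score >= threshold:
            edges[(p, q)] = score
    return edges

print(entailment_graph(KG))  # {('buy', 'own'): 1.0}
```

The score is asymmetric by construction. Every "buy" tuple is also an "own" tuple, so buy → own is kept. Only three of the four "own" tuples are "buy" tuples, so own → buy scores 0.75 and is dropped. That asymmetry is what makes the resulting graph directed.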


Bio: Mark Steedman is a Professor in the School of Informatics at the University of Edinburgh. His research is at the interdisciplinary interface of computer science and theoretical psychology in natural language processing (NLP) and Artificial Intelligence (AI). It proceeds from the conviction that language and cognition are inherently computational. His achievements include advancing the theory of grammar, robust wide-coverage statistical semantic parsing, combined logical and distributional semantics, temporal semantics, and the structure and meaning of intonation in speech. He has pioneered the application of NLP methods to the analysis of music, and the use of AI models in understanding the common evolutionary origin of language and music. His most widely recognised invention is Combinatory Categorial Grammar (CCG), a computationally practical theory of natural language grammar and processing (Steedman 1985b, 1987a, 1996a, 2000a, 2012a). This work has been recognized in its linguistic aspect by a Fellowship of the British Academy. It has been recognized in its applied aspect by Fellowships of the American Association for Artificial Intelligence (AAAI), the Association for Computational Linguistics (ACL), and the Cognitive Science Society. In 2018, Steedman received the Lifetime Achievement Award of the ACL. In work at the Universities of Sussex, Warwick, Pennsylvania, and Edinburgh, Steedman has pioneered the use of CCG as a practical basis for statistical natural language processing applications involving theoretically and computationally challenging linguistic phenomena. He has also shown that the same class of grammatical rules and statistical models is both necessary and sufficient for the analysis of harmonic and rhythmic structure in music. In addition, he has shown that both language and music have their origins in pre-linguistic multi-agent action planning. The impact of this work is evident in its adoption by industrial laboratories.
His students are employed at Google, Facebook, DeepMind, Apple, and Amazon, as well as on the faculties of the world's leading universities.